By: Nanda Akhirdianto
Supply chain analysis is the process of collecting, analyzing, and interpreting data about a company's supply chain in order to improve its efficiency and effectiveness. This can involve a variety of data, such as product types, SKUs, prices, availability, number of products sold, revenue generated, customer demographics, stock levels, lead times, order quantities, shipping times, shipping carriers, shipping costs, supplier names, locations, lead times, production volumes, manufacturing lead time, manufacturing costs, inspection results, defect rates, transportation modes, routes, and costs.
The purpose of the project data is to use it to perform supply chain analysis and identify opportunities for improvement. The data can be used to answer a variety of questions, such as:
The answers to these questions can help us make better decisions about our supply chain, such as:
We got the dataset that we used here, and this project was inspired by the data analysis project by Aman Kharwal. The dataset information is as follows:
This dataset provides a comprehensive overview of a company's supply chain. By analyzing this data, businesses can identify opportunities to improve their efficiency and effectiveness, and ultimately, their bottom line.
My role in this project is as a Data Analyst, who is responsible for carrying out the following tasks, including data interpretation, data transformation, data cleaning, and conducting descriptive analysis and visualization to get the insights needed.
In this first process, we need to define all the tools to execute this project. We are using Jupyter Notebook that has built in Python as well as functional libraries to work with data, such as Pandas, NumPy, Plotly, Matplotlib and more, and also it super easy to use. We can definitely see the result once we run our script on each cell.
# Import all necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.io as pio
import plotly.graph_objects as go
pio.templates.default = "plotly_white"
%matplotlib notebook
# Load data
data = pd.read_csv('Downloads/supply_chain_data.csv')
data.head()
| Product type | SKU | Price | Availability | Number of products sold | Revenue generated | Customer demographics | Stock levels | Lead times | Order quantities | ... | Location | Lead time | Production volumes | Manufacturing lead time | Manufacturing costs | Inspection results | Defect rates | Transportation modes | Routes | Costs | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | haircare | SKU0 | 69.808006 | 55 | 802 | 8661.996792 | Non-binary | 58 | 7 | 96 | ... | Mumbai | 29 | 215 | 29 | 46.279879 | Pending | 0.226410 | Road | Route B | 187.752075 |
| 1 | skincare | SKU1 | 14.843523 | 95 | 736 | 7460.900065 | Female | 53 | 30 | 37 | ... | Mumbai | 23 | 517 | 30 | 33.616769 | Pending | 4.854068 | Road | Route B | 503.065579 |
| 2 | haircare | SKU2 | 11.319683 | 34 | 8 | 9577.749626 | Unknown | 1 | 10 | 88 | ... | Mumbai | 12 | 971 | 27 | 30.688019 | Pending | 4.580593 | Air | Route C | 141.920282 |
| 3 | skincare | SKU3 | 61.163343 | 68 | 83 | 7766.836426 | Non-binary | 23 | 13 | 59 | ... | Kolkata | 24 | 937 | 18 | 35.624741 | Fail | 4.746649 | Rail | Route A | 254.776159 |
| 4 | skincare | SKU4 | 4.805496 | 26 | 871 | 2686.505152 | Non-binary | 5 | 3 | 56 | ... | Delhi | 5 | 414 | 3 | 92.065161 | Fail | 3.145580 | Air | Route A | 923.440632 |
5 rows × 24 columns
data.tail()
| Product type | SKU | Price | Availability | Number of products sold | Revenue generated | Customer demographics | Stock levels | Lead times | Order quantities | ... | Location | Lead time | Production volumes | Manufacturing lead time | Manufacturing costs | Inspection results | Defect rates | Transportation modes | Routes | Costs | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 95 | haircare | SKU95 | 77.903927 | 65 | 672 | 7386.363944 | Unknown | 15 | 14 | 26 | ... | Mumbai | 18 | 450 | 26 | 58.890686 | Pending | 1.210882 | Air | Route A | 778.864241 |
| 96 | cosmetics | SKU96 | 24.423131 | 29 | 324 | 7698.424766 | Non-binary | 67 | 2 | 32 | ... | Mumbai | 28 | 648 | 28 | 17.803756 | Pending | 3.872048 | Road | Route A | 188.742141 |
| 97 | haircare | SKU97 | 3.526111 | 56 | 62 | 4370.916580 | Male | 46 | 19 | 4 | ... | Mumbai | 10 | 535 | 13 | 65.765156 | Fail | 3.376238 | Road | Route A | 540.132423 |
| 98 | skincare | SKU98 | 19.754605 | 43 | 913 | 8525.952560 | Female | 53 | 1 | 27 | ... | Chennai | 28 | 581 | 9 | 5.604691 | Pending | 2.908122 | Rail | Route A | 882.198864 |
| 99 | haircare | SKU99 | 68.517833 | 17 | 627 | 9185.185829 | Unknown | 55 | 8 | 59 | ... | Chennai | 29 | 921 | 2 | 38.072899 | Fail | 0.346027 | Rail | Route B | 210.743009 |
5 rows × 24 columns
# Get slight info from our data
print(data.info())
print(data.dtypes)
<class 'pandas.core.frame.DataFrame'> RangeIndex: 100 entries, 0 to 99 Data columns (total 24 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Product type 100 non-null object 1 SKU 100 non-null object 2 Price 100 non-null float64 3 Availability 100 non-null int64 4 Number of products sold 100 non-null int64 5 Revenue generated 100 non-null float64 6 Customer demographics 100 non-null object 7 Stock levels 100 non-null int64 8 Lead times 100 non-null int64 9 Order quantities 100 non-null int64 10 Shipping times 100 non-null int64 11 Shipping carriers 100 non-null object 12 Shipping costs 100 non-null float64 13 Supplier name 100 non-null object 14 Location 100 non-null object 15 Lead time 100 non-null int64 16 Production volumes 100 non-null int64 17 Manufacturing lead time 100 non-null int64 18 Manufacturing costs 100 non-null float64 19 Inspection results 100 non-null object 20 Defect rates 100 non-null float64 21 Transportation modes 100 non-null object 22 Routes 100 non-null object 23 Costs 100 non-null float64 dtypes: float64(6), int64(9), object(9) memory usage: 18.9+ KB None Product type object SKU object Price float64 Availability int64 Number of products sold int64 Revenue generated float64 Customer demographics object Stock levels int64 Lead times int64 Order quantities int64 Shipping times int64 Shipping carriers object Shipping costs float64 Supplier name object Location object Lead time int64 Production volumes int64 Manufacturing lead time int64 Manufacturing costs float64 Inspection results object Defect rates float64 Transportation modes object Routes object Costs float64 dtype: object
# Getting to know how our data looks in statistical
data.describe()
| Price | Availability | Number of products sold | Revenue generated | Stock levels | Lead times | Order quantities | Shipping times | Shipping costs | Lead time | Production volumes | Manufacturing lead time | Manufacturing costs | Defect rates | Costs | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 100.000000 | 100.000000 | 100.000000 | 100.000000 | 100.000000 | 100.000000 | 100.000000 | 100.000000 | 100.000000 | 100.000000 | 100.000000 | 100.00000 | 100.000000 | 100.000000 | 100.000000 |
| mean | 49.462461 | 48.400000 | 460.990000 | 5776.048187 | 47.770000 | 15.960000 | 49.220000 | 5.750000 | 5.548149 | 17.080000 | 567.840000 | 14.77000 | 47.266693 | 2.277158 | 529.245782 |
| std | 31.168193 | 30.743317 | 303.780074 | 2732.841744 | 31.369372 | 8.785801 | 26.784429 | 2.724283 | 2.651376 | 8.846251 | 263.046861 | 8.91243 | 28.982841 | 1.461366 | 258.301696 |
| min | 1.699976 | 1.000000 | 8.000000 | 1061.618523 | 0.000000 | 1.000000 | 1.000000 | 1.000000 | 1.013487 | 1.000000 | 104.000000 | 1.00000 | 1.085069 | 0.018608 | 103.916248 |
| 25% | 19.597823 | 22.750000 | 184.250000 | 2812.847151 | 16.750000 | 8.000000 | 26.000000 | 3.750000 | 3.540248 | 10.000000 | 352.000000 | 7.00000 | 22.983299 | 1.009650 | 318.778455 |
| 50% | 51.239831 | 43.500000 | 392.500000 | 6006.352023 | 47.500000 | 17.000000 | 52.000000 | 6.000000 | 5.320534 | 18.000000 | 568.500000 | 14.00000 | 45.905622 | 2.141863 | 520.430444 |
| 75% | 77.198228 | 75.000000 | 704.250000 | 8253.976921 | 73.000000 | 24.000000 | 71.250000 | 8.000000 | 7.601695 | 25.000000 | 797.000000 | 23.00000 | 68.621026 | 3.563995 | 763.078231 |
| max | 99.171329 | 100.000000 | 996.000000 | 9866.465458 | 100.000000 | 30.000000 | 96.000000 | 10.000000 | 9.929816 | 30.000000 | 985.000000 | 30.00000 | 99.466109 | 4.939255 | 997.413450 |
In order to ensuring data cleanliness, we're conducting these steps to find if there's duplicate or any anomali in our data that we didn't capture in previous step.
# Review for duplicate and missing data
duplicate_rows = data[data.duplicated()]
missing_values = data.isnull().sum()
# Showing the result
print(duplicate_rows)
print(missing_values)
Empty DataFrame Columns: [Product type, SKU, Price, Availability, Number of products sold, Revenue generated, Customer demographics, Stock levels, Lead times, Order quantities, Shipping times, Shipping carriers, Shipping costs, Supplier name, Location, Lead time, Production volumes, Manufacturing lead time, Manufacturing costs, Inspection results, Defect rates, Transportation modes, Routes, Costs] Index: [] [0 rows x 24 columns] Product type 0 SKU 0 Price 0 Availability 0 Number of products sold 0 Revenue generated 0 Customer demographics 0 Stock levels 0 Lead times 0 Order quantities 0 Shipping times 0 Shipping carriers 0 Shipping costs 0 Supplier name 0 Location 0 Lead time 0 Production volumes 0 Manufacturing lead time 0 Manufacturing costs 0 Inspection results 0 Defect rates 0 Transportation modes 0 Routes 0 Costs 0 dtype: int64
In this process we're trying to answer these question to better understand how our data has done. The process will execute using some modules from pandas.
# What products are most popular?
popular = data.groupby('Product type')['Number of products sold'].sum().sort_values(ascending=False)
print('The most popular products are:')
print(popular)
# What are the most profitable products?
profitable = data.groupby('Product type')['Revenue generated'].sum().sort_values(ascending=False)
print('The most profitable products are:')
print(profitable)
# What are our stock levels?
stock_levels = data.groupby('Product type')['Stock levels'].sum()
print('Our stock level are:')
print(stock_levels)
The most popular products are: Product type skincare 20731 haircare 13611 cosmetics 11757 Name: Number of products sold, dtype: int64 The most profitable products are: Product type skincare 241628.162133 haircare 174455.390605 cosmetics 161521.265999 Name: Revenue generated, dtype: float64 Our stock level are: Product type cosmetics 1525 haircare 1644 skincare 1608 Name: Stock levels, dtype: int64
# What are our lead times?
lead_times = data.groupby('Product type')['Lead times'].mean()
print('Our lead times are:')
print(lead_times)
# How much are we spending o shipping?
shipping_costs = data['Shipping costs'].sum()
print('We are spending ${} on shipping.'.format(shipping_costs))
# What are our supplier costs?
supplier_costs = data.groupby('Supplier name')['Costs'].sum()
print("\nSupplier Costs:")
print(supplier_costs)
Our lead times are: Product type cosmetics 15.384615 haircare 15.529412 skincare 16.700000 Name: Lead times, dtype: float64 We are spending $554.8149072019588 on shipping. Supplier Costs: Supplier name Supplier 1 15520.980745 Supplier 2 11330.599298 Supplier 3 7032.001523 Supplier 4 9392.587517 Supplier 5 9648.409132 Name: Costs, dtype: float64
# What are our manufacturing costs?
manufacturing_costs = data["Manufacturing costs"].sum()
print("We are spending ${} on manufacturing.".format(manufacturing_costs))
# What are our defect rates?
defect_rates = data["Defect rates"].mean()
print("Our defect rate is {}%.".format(defect_rates * 100))
# How can we improve our transportation?
transportation_modes = data['Transportation modes'].unique()
transportation_strategies = [
"Using more efficient shipping carriers.",
"Negotiating better shipping rates with our carriers.",
"Consolidating our shipments to reduce shipping costs.",
"Using more sustainable transportation methods."
]
sorted_transportation_strategies = sorted(transportation_strategies)
sorted_optimal_transportation_strategy = "\n".join(sorted_transportation_strategies)
print("Recommended Transportation Strategy:\n", sorted_optimal_transportation_strategy)
We are spending $4726.669324146992 on manufacturing. Our defect rate is 227.71579927396098%. Recommended Transportation Strategy: Consolidating our shipments to reduce shipping costs. Negotiating better shipping rates with our carriers. Using more efficient shipping carriers. Using more sustainable transportation methods.
After we gain insight by answering a number of questions regarding each of the information provided by the data, then we need to do a descriptive analysis to answer some follow-up questions to find out the relationship of each variable that we hope can get deeper insights from this data.
# Here to analyze the relationship between the product's price and the revenue generated
fig = px.scatter(data, x='Price',
y='Revenue generated',
color='Product type',
hover_data=['Number of products sold'],
trendline='ols')
fig.show()
The visualization shows that companies earn more revenue from skincare product lines, with a linear detail where the higher the price of skincare the higher the revenue they can get.
Next, we visualize the sales of product.
# Plotting into pie chart
sales_data = data.groupby('Product type')['Number of products sold'].sum().reset_index()
pie_chart = px.pie(sales_data, values='Number of products sold',
names='Product type',
title='Sales by Product Type',
hover_data=['Number of products sold'],
hole=0.5,
color_discrete_sequence=px.colors.qualitative.Pastel)
pie_chart.update_traces(textposition='inside',
textinfo='percent+label')
pie_chart.show()
45% of total sales came from skin care, 29.5% from hair care, and 25.5% from cosmetics.
Next, we will see how the distribution of revenue generated by shipping carriers.
total_revenue = data.groupby('Shipping carriers')['Revenue generated'].sum().reset_index()
fig = go.Figure()
fig.add_trace(go.Bar(x=total_revenue['Shipping carriers'],
y=total_revenue['Revenue generated']))
fig.update_layout(title='Total Revenue by Shipping Carrie',
xaxis_title= 'Shipping Carrier',
yaxis_title='Revenue Generated')
fig.show()
After observed from the data above, from three available carriers, carrier B is a shipping carrier that helps companies earn more revenue.
Now let's have a work to get to know the Average lead time and Average Manufacturing Costs for all products of the company.
# Here to see how is the average lead time and mfg. costs look like
lead_time_avg = data.groupby('Product type')['Lead time'].mean().reset_index()
mfg_costs_avg = data.groupby('Product type')['Manufacturing costs'].mean().reset_index()
# Define and show the result
result = pd.merge(lead_time_avg, mfg_costs_avg, on='Product type')
result.rename(columns={'Lead time': 'Average Lead Time', 'Manufacturing Costs': 'Average Manufacturing Costs'}, inplace=True)
print(result)
Product type Average Lead Time Manufacturing costs 0 cosmetics 13.538462 43.052740 1 haircare 18.705882 48.457993 2 skincare 18.000000 48.993157
Let's continue to analyze the distribution of each SKU in terms of revenue generated, stock level and order quantity.
# Number of SKU x revenue generated
revenue_chart = px.line(data, x='SKU', y='Revenue generated', title='Revenue generated by SKU')
revenue_chart.show()
# Number of SKU x stock levels
stock_chart = px.line(data, x='SKU', y='Stock levels', title='Stock levels by SKU')
stock_chart.show()
# Number of SKU x order quantity
order_quantity_chart = px.bar(data, x='SKU', y='Order quantities', title='Order Quantity by SKU')
order_quantity_chart.show()
As part of analyzing cost, we want to see the cost distribution in terms of shipping carriers and transportation mode.
# Distrubution shipping carriers x shipping costs
shipping_cost_chart = px.bar(data, x='Shipping carriers',
y='Shipping costs',
title='Distribution of Shipping Costs by Carriers')
shipping_cost_chart.show()
From the previous process by analyze the shipping carriers revenue generated, we've seen that carrier B was the highest top choice and helped company to get more revenue, but also as shown on the above visualization, carrier B also the most costly carrier among the three choices.
How about the cost distribution by transportation mode?
transport_chart = px.pie(data, values='Costs',
names='Transportation modes',
title='Cost Distribution by Transportation Modes',
hole=0.5,
color_discrete_sequence=px.colors.qualitative.Pastel)
transport_chart.show()
The company spends more on land routes, with road and rail modes. The distribution shows that 30.3% of costs are incurred by road mode, and 28.7% by rail mode.
Next we will analyze the level of defects. This is important to analyze, because in every shipment of goods or products, sometimes we found the defect one. We will analyze the average defect rate for each product, and which mode of transportation tends to occur frequently.
defect_rates_by_product = data.groupby('Product type')['Defect rates'].mean().reset_index()
# Ploting into bar chart
fig = px.bar(defect_rates_by_product,
x='Product type',
y='Defect rates',
title='Average Defect Rates by Product Type')
fig.show()
# Defect rates bt transportation mode
pivot_table = pd.pivot_table(data, values='Defect rates',
index=['Transportation modes'],
aggfunc='mean')
# Plotting into pie chart
transportation_chart = px.pie(values=pivot_table['Defect rates'],
names=pivot_table.index,
title='Defect Rates by Transportation Modes',
hole=0.5,
color_discrete_sequence=px.colors.qualitative.Pastel)
transportation_chart.show()
From the visualization shown above, the defect rate that occurs on average for haircare product lines has the highest level. Meanwhile, the land route, or more precisely the road, is the mode of transportation that tends to have the highest rate of defect.
Following are some important findings after carrying out the entire analysis process:
Earlier this project we already define some questions to be answer through this whole project. So, Here's to summarize it all by answering those questions and provide some recommendation regarding this topic:
Answer: Companies can focus more on maintaining and developing products from lines that are most popular in the eyes of consumers and those that have high profits, the answer is the Skincare product line.
Answer: Data shows Suppliers 1 and 2 are the top 2 suppliers with the highest costs, it would be better to consider other suppliers (supplier 3 for example) which have more affordable costs, apart from being able to incur lower costs of course the company can make savings.
Answer: By looking at the available data, there are several opportunities that companies can use to improve their manufacturing processes. Such as optimizing processes by using more cost-effective materials. Work with suppliers to find ways to improve ingredient quality, reduce shipping costs, or increase delivery times. Consolidate with distributors in the distribution of goods to arrive on time and reduce the level of defects in goods.
Answer: Several ways can be done to reduce shipping costs, we can do that by using multiple carriers, consider in using consolidated shipping, mind use dimensional weight pricing, ship during off-peak hours, ensure that the goods are handled and packaged securely to avoid defects caused by the shipping process, and last but not least use free shipping if available.
Answer: Ways that can be done to improve service to customers are investing in training, this can be train the people (employee) or other stakeholders, to make sure they trained well to produce excellent product. Gathering feedback from customers, by doing this one, can help company identify areas where they can improve the customer service. Monitoring customer service metrics, by tracking customer satisfaction scores and complaint rates which this will help to measure the effectiveness of the customer service efforts. If needed, use technology as an advantage, this can be done by developing new technology that aimed to direct connection and respons to the customer such as, chatbots, live chat and social media monitoring.